Learning with Lq<1 vs L1-Norm Regularisation with Exponentially Many Irrelevant Features
Authors
Abstract
We study the use of fractional norms for regularisation in supervised learning from high-dimensional data, under conditions of a large number of irrelevant features, focusing on logistic regression. We develop a variational method for parameter estimation, and show an equivalence between two approximations recently proposed in the statistics literature. Building on previous work by A. Ng, we show that fractional-norm regularised logistic regression enjoys a sample complexity that grows logarithmically with the data dimension and polynomially with the number of relevant dimensions. In addition, extensive empirical testing indicates that fractional-norm regularisation is more suitable than L1 in cases where the number of relevant features is very small, and works very well despite a large number of irrelevant features.

1 Lq<1-Regularised Logistic Regression

Consider a training set of pairs z = {(x_j, y_j)}_{j=1}^n drawn i.i.d. from some unknown distribution P. The x_j ∈ ℝ^m are m-dimensional input points and the y_j ∈ {−1, 1} are the associated target labels for these points. Given z, the aim in supervised learning is to learn a mapping from inputs to targets that can then predict the target values for previously unseen points that follow the same distribution as the training data. We are interested in problems with a large number m of input features, of which only a few r ≪ m are relevant to the target. In particular, we focus on a form of regularised logistic regression for this purpose:
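The objective introduced by the colon above is not reproduced in this extract. As a hedged reconstruction only, assuming the standard form of Lq-penalised logistic regression with weight vector θ and a regularisation parameter λ (notation assumed here, not taken from the paper), the objective would read:

% Plausible form of the Lq<1-regularised logistic regression objective; the
% regularisation weight \lambda and the q-th power of the quasi-norm are
% assumptions, not taken verbatim from the paper.
\hat{\theta} = \arg\min_{\theta \in \mathbb{R}^m}
    \sum_{j=1}^{n} \log\!\left( 1 + \exp\!\left( -y_j\, \theta^{\top} x_j \right) \right)
    + \lambda \sum_{i=1}^{m} |\theta_i|^{q},
\qquad 0 < q < 1 .

The paper's own parameter estimation is variational; the sketch below is not that method. It is only a minimal numpy baseline that minimises a smoothed surrogate of the objective above by gradient descent, where the smoothing constant eps and all function and variable names are illustrative assumptions.

import numpy as np

def fit_lq_logistic(X, y, q=0.5, lam=1.0, eps=1e-8, lr=0.01, n_iter=5000):
    """X: (n, m) inputs; y: (n,) labels in {-1, +1}. Returns the weight vector theta."""
    n, m = X.shape
    theta = np.zeros(m)
    for _ in range(n_iter):
        margins = np.clip(y * (X @ theta), -30.0, 30.0)   # y_j * theta^T x_j, clipped for numerical stability
        sigma = 1.0 / (1.0 + np.exp(margins))              # derivative factor of log(1 + exp(-margin))
        grad_loss = -(X.T @ (y * sigma))                   # gradient of the logistic loss term
        # Gradient of the smoothed penalty lam * sum_i (theta_i^2 + eps)^(q/2),
        # which approximates lam * sum_i |theta_i|^q while staying differentiable at 0.
        grad_pen = lam * q * theta * (theta ** 2 + eps) ** (q / 2.0 - 1.0)
        theta -= lr * (grad_loss + grad_pen)
    return theta

# Toy usage: r = 3 relevant features out of m = 50 (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(100))
theta_hat = fit_lq_logistic(X, y, q=0.5, lam=2.0)
print("largest-magnitude coordinates:", np.argsort(-np.abs(theta_hat))[:5])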
Similar papers
Efficient L1/Lq Norm Regularization
Sparse learning has recently received increasing attention in many areas including machine learning, statistics, and applied mathematics. The mixed-norm regularization based on the l1/lq norm with q > 1 is attractive in many applications of regression and classification in that it facilitates group sparsity in the model. The resulting optimization problem is, however, challenging to solve due t...
Feature selection, L1 vs. L2 regularization, and rotational invariance
We consider supervised learning in the presence of very many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that using L1 regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn “well,”) grows only logarithmically in the number of irrelevant features...
Efficient Mixed-Norm Regularization: Algorithms and Safe Screening Methods
Sparse learning has recently received increasing attention in many areas including machine learning, statistics, and applied mathematics. The mixed-norm regularization based on the l1/lq norm with q > 1 is attractive in many applications of regression and classification in that it facilitates group sparsity in the model. The resulting optimization problem is, however, challenging to solve due t...
Learning Robust Graph Regularisation for Subspace Clustering
Various subspace clustering methods have benefited from introducing a graph regularisation term in their objective functions. In this work, we identify two critical limitations of the graph regularisation term employed in existing subspace clustering models and provide solutions for both of them. First, the squared l2-norm used in the existing term is replaced by a l1-norm term to make the regu...
A Dirty Model for Multi-task Learning
We consider multi-task learning in the setting of multiple linear regression, and where some relevant features could be shared across the tasks. Recent research has studied the use of l1/lq norm block-regularizations with q > 1 for such blocksparse structured problems, establishing strong guarantees on recovery even under high-dimensional scaling where the number of features scale with the numb...
Publication date: 2008